Source code: https://github.com/developmentseed/agu2021-python-data-viz
Live slides: http://devseed.com/agu2021-python-data-viz/
Credit to Jim Bednar, Filipe Fernandes, Philipp Rudiger and Vincent Sarago who developed much of the content in these tutorials.
HoloViz is a set of compatible tools to make it easier to see and understand your data at every stage needed by users, research groups, and projects:
Why "Holo"? "holo-", from the Greek root "hólos", means "whole, entire, complete".

To address the above issues, we have developed a set of open-source Python packages to streamline the process of working with small and large datasets (from a few datapoints to billions or more) in a web browser, whether doing exploratory analysis, making simple widget-based tools, or building full-featured dashboards. The main libraries in this ecosystem include:
Beyond the specific HoloViz tools, all these approaches work with and often rely upon a wide range of other open-source libraries for their implementation, including:
In this tutorial, we'll focus on an example dataset, using it to illustrate how to:
The tutorial is organized from the most general to the most specific in terms of tool support. We first look at the Panel package, which works with nearly any plotting library, and at hvPlot, which works with nearly any data library and shares an API with many other plotting libraries. We then dive deeper into HoloViz-specific approaches that let you work with large data and provide deep interactivity and other advanced features.
Before going further, it's worth exploring some examples of what you can get with HoloViz, to make sure that it covers your needs:
And then you can browse through the already-run versions of the HoloViz tutorials to see what they cover and how it all fits together. But everything on this website is a Jupyter Notebook that you can run yourself, once you follow the installation instructions, so the next step is then to try it all out and have fun exploring it!
If your data is in a Pandas dataframe, it's natural to explore it using the .plot() method (based on Matplotlib). Let's look at a dataset of the number of cases of measles and pertussis (per 100,000 people) over time in each state:
import pandas as pd
# pertussis is whooping cough
df = pd.read_csv('data/diseases.csv.gz')
df[1000:1005]
| | Year | Week | State | measles | pertussis |
|---|---|---|---|---|---|
| 1000 | 1947 | 13 | Alabama | 4.93 | 2.24 |
| 1001 | 1947 | 14 | Alabama | 9.96 | 3.50 |
| 1002 | 1947 | 15 | Alabama | 6.39 | 1.29 |
| 1003 | 1947 | 16 | Alabama | 12.03 | 2.86 |
| 1004 | 1947 | 17 | Alabama | 10.37 | 4.08 |
Just calling .plot() won't give anything meaningful, because it doesn't know what should be plotted against what:
%matplotlib inline
df.plot();
But with some Pandas operations we can pull out parts of the data that make sense to plot:
import numpy as np
by_year = df[["Year","measles"]].groupby("Year").aggregate(np.sum)
by_year.plot();
Here it is easy to see that the 1963 introduction of a measles vaccine brought the cases down to negligible levels.
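To make the group-and-sum step concrete, here is a minimal sketch on a tiny hand-made frame. The 1947 values are copied from the table shown earlier; the 1948 rows and the names `sample` and `by_year_sample` are made up for illustration.

```python
import pandas as pd

# Tiny sample with the same column names as the diseases table.
# The 1948 values are invented purely for illustration.
sample = pd.DataFrame({
    "Year": [1947, 1947, 1948, 1948],
    "Week": [13, 14, 13, 14],
    "State": ["Alabama", "Alabama", "Alabama", "Alabama"],
    "measles": [4.93, 9.96, 1.00, 2.00],
})

# Keep only the columns of interest, then sum the weekly rates within each year.
by_year_sample = sample[["Year", "measles"]].groupby("Year").sum()
print(by_year_sample)
```

The result is indexed by Year, which is why a bare .plot() call can now use Year as the x axis without further hints.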
The above plots are just static images, but if you import the hvplot package, you can use the same plotting API to get fully interactive plots with hover, pan, and zoom in a web browser:
import hvplot.pandas # noqa: adds hvplot method to pandas objects
by_year.hvplot()
Here the interactive features are provided by the Bokeh JavaScript-based plotting library. But what's actually returned by this call is something called a HoloViews object, here specifically a HoloViews Curve. HoloViews objects display as a Bokeh plot, but they are actually much richer objects that make it easy to capture your understanding as you explore the data:
import holoviews as hv
vline = hv.VLine(1963).opts(color='black')
m = by_year.hvplot() * vline * \
hv.Text(1963, 27000, " Vaccine introduced", halign='left')
m
While still always being able to access the original data involved for further analysis:
print(m)
m.Curve.I.data.head()
:Overlay .Curve.I :Curve [Year] (measles) .VLine.I :VLine [x,y] .Text.I :Text [x,y]
| | Year | measles |
|---|---|---|
| 0 | 1928 | 16924.34 |
| 1 | 1929 | 12060.96 |
| 2 | 1930 | 14575.11 |
| 3 | 1931 | 15427.67 |
| 4 | 1932 | 14481.11 |
For other plotting libraries, a given visualization that you construct is a dead end -- if you want to change it in some way, you'll need to reconstruct it from scratch with different settings.
Because HoloViews objects preserve your original data, you can now do more with your data than you could before, including anything you could do with the raw data, plus overlaying (as above), laying out in subfigures, slicing, sampling, setting options, and many other operations.
For instance, with HoloViews it's simple to break down the data in different ways. You can inspect each state individually:
measles_agg = df.groupby(['Year', 'State'])['measles'].sum()
by_state = measles_agg.hvplot('Year', groupby='State', width=500, dynamic=False)
by_state * vline
Or pull out a couple of those to put side by side:
by_state["Texas"].relabel('Texas') * vline + by_state["New York"].relabel('New York') * vline
Or to compare four states over time by overlaying:
states = ['New York', 'New Jersey', 'California', 'Texas']
measles_agg.loc[1930:2005, states].hvplot(by='State') * vline
Or by faceting:
measles_agg.loc[1930:2005, states].hvplot('Year', col='State', width=400, height=200, rot=90) * vline
Or as a different type of plot, such as a bar chart:
measles_agg.loc[1980:1990, states].hvplot.bar('Year', by='State', rot=90)
Or with additional information, such as error bars:
df_error = df.groupby('Year').agg({'measles': [np.mean, np.std]}).xs('measles', axis=1)
df_error.hvplot(y='mean') * hv.ErrorBars(df_error, 'Year').redim.range(mean=(0, None)) * vline
If we really want to invest a lot of time in making a fancy plot, we can customize it to try to show all the yearly data about measles at once:
heatmap = df.hvplot.heatmap('Year', 'State', 'measles', reduce_function=np.nansum,
                            logz=True, height=500, width=900, xaxis=None, flip_yaxis=True, clim=(1, np.nan))
aggregate = hv.Dataset(heatmap).aggregate('Year', np.mean, np.std)
agg = hv.ErrorBars(aggregate) * hv.Curve(aggregate).opts(xrotation=90)
agg = agg.options(height=200, show_title=False)
marker = hv.Text(1963, 800, u'\u2193 Vaccine introduced', halign='left')
(heatmap + (agg * marker).opts(width=900)).cols(1)
If you prefer, you can choose matplotlib to render your HoloViews plots, though you give up the interactive pan, zoom, and hover from Bokeh.
As you can see, these tools make it very quick to explore your data in a browser, and if you choose HoloViews+Bokeh plots, you can have full interactivity with very little code even for quite complex datasets.
Panel is designed to make it simple to add interactive controls to your existing plots and data displays, simple to build apps for your own use in a notebook, simple to deploy apps as standalone dashboards to share with colleagues, and seamlessly shift back and forth between each of these tasks as your needs evolve. If there is one thing you should take away from this tutorial, it's Panel!
Throughout this tutorial we will use a wave heights dataset collected by NOAA, so we will start by loading it:
from load_data import *
import panel as pn
pn.extension()
df = load_data()
print(df.shape)
df.head()
(230988, 7)
| | station | latitude | longitude | time | wvht | wspd | gst |
|---|---|---|---|---|---|---|---|
| 0 | 41001 | 34.675 | -72.698 | 2021-01-01T00:40:00Z | 2.14 | 10.0 | 12.8 |
| 1 | 41001 | 34.675 | -72.698 | 2021-01-01T01:40:00Z | 2.23 | 10.6 | 12.9 |
| 2 | 41001 | 34.675 | -72.698 | 2021-01-01T02:40:00Z | 2.07 | 10.6 | 13.3 |
| 3 | 41001 | 34.675 | -72.698 | 2021-01-01T03:40:00Z | 1.97 | 9.2 | 11.6 |
| 4 | 41001 | 34.675 | -72.698 | 2021-01-01T04:40:00Z | 1.94 | 9.2 | 11.3 |
Before we get into the details of how Panel allows you to render and lay out objects, we will dive straight in and use Panel's interact function, modeled on the similar function in ipywidgets, to get a simple interactive app immediately. For instance, if you have a function that returns a row of a dataframe given an index, you can very easily make a panel with a widget to control the row displayed.
def select_row(row=0):
    row = df.loc[row].to_frame()
    return row.style.format({"time": lambda t: t.strftime("%c")})
pn.interact(select_row, row=(0, len(df)-1))
This approach can be used for any function that returns a displayable object, calling the function whenever one of the parameters of that function has changed.
In the spirit of "shortcuts, not dead ends", let's see what's in the object returned by interact:
app = pn.interact(select_row, row=(0, len(df)-1))
print(app)
Column
    [0] Column
        [0] IntSlider(end=230987, name='row')
    [1] Row
        [0] HTML(Styler, name='interactive73847')
interact has constructed a Column panel consisting of one Column of widgets (with one widget), and one Row of output (with one HTML pane). This object, once created, is a full compositional Panel object, and can be reconfigured and expanded with additional content if you wish, without breaking the connections between widgets and values:
pn.Column("## Choose a row", pn.Row(app[0], app[1]))
Hopefully from this simple example you can see the sorts of things Panel can do. In the rest of this section we'll cover some of the items you can use in a panel and how to compose them. In the subsequent section we will dive into how to set up widgets and their relationships explicitly, and then build a custom dashboard as an exercise. For now, we won't show code for any particular plotting library, but if you have a favorite one already, you should be able to use it with Panel in the exercises.
Before we start building more interactive apps, we will learn about the three main types of components in Panel:
If you ever want to discover how a particular component works, see the reference gallery.
The fundamental concept behind Panel is that it transforms the objects you give it into a viewable object that can be composed into a layout and updated dynamically. In this tutorial we will be building a dashboard visualizing a dataset of waves, so let us start by displaying a title using the pn.panel function:
title = pn.panel('## Major Waves Dashboard')
title
# top 5 waves in July 2021
df_sorted = df.sort_values(by=['wvht'], ascending=False)
df_sorted.head()
| | station | latitude | longitude | time | wvht | wspd | gst |
|---|---|---|---|---|---|---|---|
| 107281 | 46071 | 51.155 | 179.001 | 2021-01-01T00:50:00Z | 17.68 | NaN | NaN |
| 107283 | 46071 | 51.155 | 179.001 | 2021-01-01T02:50:00Z | 17.55 | NaN | NaN |
| 107284 | 46071 | 51.155 | 179.001 | 2021-01-01T03:50:00Z | 17.30 | NaN | NaN |
| 107286 | 46071 | 51.155 | 179.001 | 2021-01-01T05:50:00Z | 16.12 | NaN | NaN |
| 107285 | 46071 | 51.155 | 179.001 | 2021-01-01T04:50:00Z | 15.94 | NaN | NaN |
The pn.panel function attempts to find the most appropriate representation for different objects whether it is a string, an image, or even a plot. So if we provide the location of a PNG file instead as a path or a URL, the panel function will automatically infer that it should be rendered as an image:
noaa_logo = pn.panel('assets/noaa-lrg.png', height=130)
noaa_logo
The appropriate representation is resolved using a set of precedences, so it may sometimes be necessary to explicitly declare the type of Pane that is required. For example, if we want to display some HTML, which cannot easily be distinguished from Markdown, we can explicitly declare it by specifying the HTML Pane type from the pn.pane module:
pn.pane.HTML('<marquee width=500><b>Breaking news</b>: Major waves off coast of Rat Islands</marquee>')
In addition to Pane objects, Panel provides Panel objects that allow laying out components. The principal layouts are by Row or Column. These components act just like a regular list in Python:
column = pn.Column(title, noaa_logo, app)
column
Panels may be nested arbitrarily to construct complex layouts. Internally, Panel will call the pn.panel function on any objects which are not already a known component type, making it easy to lay out objects without explicitly wrapping them in a panel component, though wrapping it explicitly can help ensure that it is the type you expect:
df_top5 = pd.DataFrame(df_sorted[0:5], columns=['station', 'time', 'wvht'])
row = pn.Row(column,
pn.Column('### Top 5', pn.panel(df_top5, width=500)))
row
In the previous section we learned the very basics of working with Panel. Specifically we looked at the different types of components, how to update them and how to serve a Panel application or dashboard. However to start building actual apps with Panel we need to be able to add interactivity by linking different components together. In this section we will learn how to link widgets to outputs to start building some simple interactive applications.
In this section we will once again make use of the wave heights dataset we loaded previously and compute some statistics.
pn.interact constructs widgets automatically that can then be reconfigured, but if you want more control, you'll want to instantiate widgets explicitly. A widget is an input control that allows a user to change a value using some graphical UI. A simple example is a RangeSlider:
wvht_filter = pn.widgets.RangeSlider(name='Wave Heights', start=0, end=df['wvht'].max())
wvht_filter
The widget value is a Parameter that is set to a tuple of the selected lower and upper bounds. Parameters are an extended type of Python attribute that declare their type, range, etc. so that other code can interact with them in a consistent way. When we change the range using the widget, the value parameter updates, and vice versa if you change the value parameter manually:
wvht_filter.value
(0, 17.68)
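One way such a (lower, upper) tuple might be used is to filter the dataframe by wave height. The sketch below is self-contained and does not touch the real widget: `waves` holds a few values copied from the tables shown earlier, and `filter_by_range` is a hypothetical helper name.

```python
import pandas as pd

# A few rows shaped like the wave-height table; values copied from the rows displayed earlier.
waves = pd.DataFrame({
    "station": [41001, 41001, 41001, 46071],
    "wvht": [2.14, 2.23, 1.94, 17.68],
})

def filter_by_range(frame, bounds):
    """Return rows whose wave height lies within the inclusive (lower, upper) bounds."""
    lower, upper = bounds
    return frame[frame["wvht"].between(lower, upper)]

# A tuple like this is what a range slider's value would supply.
selected = filter_by_range(waves, (2.0, 3.0))
print(selected)
```

In an app, the same function could be called with the slider's current value each time it changes.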
The depends API is still a very high level way of declaring interactive components. Panel also supports the more low-level approach of writing callbacks in response to changes in some parameter, e.g. the value of a widget. All parameters can be watched using the .param.watch API, which will call the provided callback with an event object containing the old and new value of the widget.
Now that it is loaded we will create a slider which we will eventually use to select the row of the dataframe that we want to display.
row_slider = pn.widgets.IntSlider(value=0, start=0, end=len(df)-1)
Next we create a Pane to display the current row of the dataframe with times formatted nicely:
row_pane = pn.panel(df.loc[row_slider.value])
row_pane
Now that we have defined both the widget and the object we want to update we can declare a callback to link the two. As we learned in the previous section assigning a new value to the object of a pane will update the display. In the callback we select the row of the dataframe and then assign it to the pane.object.
def df_callback(event):
    row_pane.object = df.loc[event.new]
Lastly, we have to register this callback. To do so, we pass the callback and the name of the parameter that should trigger it to the slider's .param.watch method:
row_slider.param.watch(df_callback, 'value')
Watcher(inst=IntSlider(end=230987), cls=<class 'panel.widgets.slider.IntSlider'>, fn=<function df_callback at 0x17a229dc0>, mode='args', onlychanged=True, parameter_names=('value',), what='value', queued=False, precedence=0)
Now that everything is connected up we can put both the widget and the pane in a panel and display them:
pn.Column(row_slider, row_pane, width=400)
As you can see, this process is slightly more laborious than pn.interact or even the pn.depends approach, but doing it in this way should help you see how everything fits together and can be useful to more precisely control callbacks that update particular parameters or the contents of a larger layout.
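To see the mechanics without any library involved, here is a toy, standard-library-only sketch of the watch pattern. `MiniSlider` and `Event` are hypothetical stand-ins invented for this illustration; they are not Panel's or Param's implementation, and only mirror the idea of an event carrying the old and new value and firing when the value actually changes.

```python
from collections import namedtuple

# Hypothetical stand-in for the event object handed to a watcher.
Event = namedtuple("Event", ["name", "old", "new"])

class MiniSlider:
    """Toy widget with a single watched attribute, `value` (not Panel's implementation)."""

    def __init__(self, value=0):
        self._value = value
        self._watchers = []

    def watch(self, callback):
        """Register a callable to run whenever `value` actually changes."""
        self._watchers.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old, self._value = self._value, new
        if new != old:  # only notify on a real change
            for callback in self._watchers:
                callback(Event("value", old, new))

seen = []
slider = MiniSlider(value=0)
slider.watch(seen.append)
slider.value = 3
slider.value = 3  # unchanged, so no event is sent
slider.value = 5
print(seen)
```

The callback receives both the old and the new value, so it can decide what to update, just as df_callback uses event.new to pick the row.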
HoloViz is a modular suite of tools, and when you need capabilities not handled by Bokeh and HoloViews (and optionally hvPlot) as above, you can bring those in:
We'll look at a dataset of earthquakes on a map.
import dask.dataframe as dd
import datashader as ds
from colorcet import palette
from holoviews.element.tiles import EsriImagery
topts = hv.opts.Tiles(width=700, height=600, bgcolor='black',
xaxis=None, yaxis=None, show_grid=False)
tiles = EsriImagery().opts(topts)
earthquakes = dd.read_parquet('data/earthquakes.parq', engine='fastparquet').persist()
colormaps = {n: palette[n] for n in ['fire','bgy','bgyw','bmy','gray','kbc']}
x, y = ds.utils.lnglat_to_meters(earthquakes.longitude, earthquakes.latitude)
projected_earthquakes = earthquakes.assign(x=x, y=y).persist()
import hvplot.dask
def view(cmap=colormaps['fire'], alpha=1, reverse_colormap=False):
    cmap = cmap if not reverse_colormap else cmap[::-1]
    return tiles.opts(alpha=alpha) * projected_earthquakes.hvplot.points(
        'x', 'y', datashade=True, cmap=cmap
    )
view()
As you can see, you can specify geospatial plots easily, and if your HoloViews objects are too big to visualize in a browser directly, you can add datashade=True (as in the call above) to render them into images dynamically on zooming, etc.
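The core idea behind this kind of rendering, binning every point into a fixed-size grid so the browser only receives an image-sized array, can be sketched in plain Python. This is a simplified illustration under that assumption, not Datashader's implementation; `rasterize_counts` is a hypothetical name.

```python
def rasterize_counts(xs, ys, x_range, y_range, width, height):
    """Count how many points fall into each cell of a height x width grid."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # point lies outside the current viewport
        # Map the coordinate to a cell index, clamping the upper edge into the last cell.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid

xs = [0.1, 0.2, 0.9, 0.95, 5.0]
ys = [0.1, 0.15, 0.9, 0.85, 5.0]
grid = rasterize_counts(xs, ys, x_range=(0, 1), y_range=(0, 1), width=2, height=2)
print(grid)
```

The grid size depends only on the plot's pixel dimensions, not on the number of points, which is why the approach scales to very large datasets. Zooming simply repeats the binning with a narrower x_range and y_range.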
NOTE: HoloViews includes support for basic web-based map tiles as used here, but if you need to work flexibly with different geographic projections, you'll want to install GeoViews as well. See the notebook on Geographic Data for more information.
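The web map tiles used here are laid out in Web Mercator coordinates, which is why the longitude and latitude columns were converted to meters before plotting. A standard-library sketch of the usual spherical Web Mercator formula follows; `lnglat_to_web_mercator` is a hypothetical name written for this illustration and works on single values rather than whole columns.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis used by Web Mercator

def lnglat_to_web_mercator(longitude, latitude):
    """Convert degrees of longitude/latitude to Web Mercator easting/northing in meters."""
    easting = math.radians(longitude) * EARTH_RADIUS_M
    northing = math.log(math.tan(math.pi / 4 + math.radians(latitude) / 2)) * EARTH_RADIUS_M
    return easting, northing

# Location of the first buoy shown in the wave-height table.
x, y = lnglat_to_web_mercator(-72.698, 34.675)
print(round(x), round(y))
```

Because the northing grows without bound toward the poles, Web Mercator maps are normally cut off at roughly 85 degrees of latitude.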
You can also easily add widgets to control filtering, selection, and other options interactively, either here in the notebook or by putting the same code in a separate file and running it as a standalone server:
import panel as pn
explorer = pn.interact(view, cmap=colormaps, alpha=(0, 1.), reverse_colormap=False)
pn.Row(pn.Column('# Earthquake Explorer', explorer[0]), explorer[1]).servable()
Let's see it in action by uncommenting the following line:
# !panel serve earthquakes-dashboard.py
Here we used the Panel interact function to create a simple app based on the view function, and then we mixed and matched some of its components to lay it out in rows and columns as you see above.
In this simple app, the view function is called whenever any of the parameters change (alpha, colormap, or location), triggering a full rerender, but you can get a more responsive interface if you take the time to declare which computations depend on which parameters (see the Deploying Bokeh Apps tutorial).
Either way, the app should work the same here in the notebook (if you have a live Python process) or as a standalone server by calling panel serve with either the name of a Python file with the above code or simply the name of this notebook (where it will run the notebook code and serve any objects marked .servable()).
Cloud-optimized GeoTIFFs paired with a dynamic tiler API provide the ability to view geospatial imagery at both high and low resolutions. This demo steps through how to use a COG and dynamic tiler API to view a single COG on a map.
TiTiler has other capabilities for rendering COGs, such as visualizing many COGs at once using mosaicJSON. Learn more about TiTiler via the documentation website: https://developmentseed.org/titiler/
https://developmentseed.org/titiler/examples/notebooks/Working_with_CloudOptimizedGeoTIFF_simple/
For this demo we will use the DigitalGlobe OpenData dataset https://www.digitalglobe.com/ecosystem/open-data
import json
import httpx
from folium import Map, TileLayer
%pylab inline
titiler_endpoint = "https://titiler.xyz" # Developmentseed Demo endpoint. Please be kind.
url = "https://opendata.digitalglobe.com/events/mauritius-oil-spill/post-event/2020-08-12/105001001F1B5B00/105001001F1B5B00.tif"
Populating the interactive namespace from numpy and matplotlib
/Users/aimeebarciauskas/miniconda3/envs/python-dataviz/lib/python3.9/site-packages/IPython/core/magics/pylab.py:159: UserWarning: pylab import has clobbered these variables: ['nansum', 'colormaps', 'title']
`%matplotlib` prevents importing * from pylab and numpy
warn("pylab import has clobbered these variables: %s" % clobbered +
Fetch the COG metadata to get the bounds of the image, which we use later to center the map:
r = httpx.get(
    f"{titiler_endpoint}/cog/info",
    params={
        "url": url,
    }
).json()
bounds = r["bounds"]
# Fetch File Metadata to get min/max rescaling values (because the file is stored as float32)
r = httpx.get(
    f"{titiler_endpoint}/cog/statistics",
    params={
        "url": url,
    }
).json()
print(json.dumps(r, indent=4))
{
"1": {
"min": 0.0,
"max": 255.0,
"mean": 36.94901407469342,
"count": 574080.0,
"sum": 21211690.0,
"std": 48.282133573955264,
"median": 3.0,
"majority": 1.0,
"minority": 246.0,
"unique": 256.0,
"histogram": [
[
330584.0,
54820.0,
67683.0,
57434.0,
30305.0,
14648.0,
9606.0,
5653.0,
2296.0,
1051.0
],
[
0.0,
25.5,
51.0,
76.5,
102.0,
127.5,
153.0,
178.5,
204.0,
229.5,
255.0
]
],
"valid_percent": 93.75,
"masked_pixels": 38272.0,
"valid_pixels": 574080.0,
"percentile_98": 171.0,
"percentile_2": 0.0
},
"2": {
"min": 0.0,
"max": 255.0,
"mean": 57.1494356187291,
"count": 574080.0,
"sum": 32808348.0,
"std": 56.300819175100656,
"median": 37.0,
"majority": 5.0,
"minority": 0.0,
"unique": 256.0,
"histogram": [
[
271018.0,
34938.0,
54030.0,
69429.0,
70260.0,
32107.0,
29375.0,
9697.0,
2001.0,
1225.0
],
[
0.0,
25.5,
51.0,
76.5,
102.0,
127.5,
153.0,
178.5,
204.0,
229.5,
255.0
]
],
"valid_percent": 93.75,
"masked_pixels": 38272.0,
"valid_pixels": 574080.0,
"percentile_98": 180.0,
"percentile_2": 5.0
},
"3": {
"min": 0.0,
"max": 255.0,
"mean": 51.251764562430324,
"count": 574080.0,
"sum": 29422613.0,
"std": 39.65505035854822,
"median": 36.0,
"majority": 16.0,
"minority": 252.0,
"unique": 254.0,
"histogram": [
[
203263.0,
150865.0,
104882.0,
42645.0,
30652.0,
25382.0,
12434.0,
2397.0,
1097.0,
463.0
],
[
0.0,
25.5,
51.0,
76.5,
102.0,
127.5,
153.0,
178.5,
204.0,
229.5,
255.0
]
],
"valid_percent": 93.75,
"masked_pixels": 38272.0,
"valid_pixels": 574080.0,
"percentile_98": 158.0,
"percentile_2": 14.0
}
}
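As a sketch of how rescaling values could be derived from a response like this, the snippet below builds one "min,max" string per band from the 2nd and 98th percentiles. The `stats` dictionary is a trimmed copy of the output above and `rescale_ranges` is a hypothetical helper. Passing such strings to the tiler as a rescale query parameter is an assumption here and should be checked against the TiTiler documentation.

```python
# Trimmed copy of the statistics response printed above (only the fields used here).
stats = {
    "1": {"percentile_2": 0.0, "percentile_98": 171.0},
    "2": {"percentile_2": 5.0, "percentile_98": 180.0},
    "3": {"percentile_2": 14.0, "percentile_98": 158.0},
}

def rescale_ranges(band_stats):
    """Return one 'min,max' string per band, ordered by band number."""
    return [
        f"{band_stats[band]['percentile_2']:g},{band_stats[band]['percentile_98']:g}"
        for band in sorted(band_stats, key=int)
    ]

ranges = rescale_ranges(stats)
print(ranges)
```

Using percentiles rather than the absolute min and max keeps a handful of extreme pixels from washing out the contrast of the rendered image.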
r = httpx.get(
    f"{titiler_endpoint}/cog/tilejson.json",
    params={
        "url": url,
    }
).json()
m = Map(
    location=((bounds[1] + bounds[3]) / 2, (bounds[0] + bounds[2]) / 2),
    zoom_start=10
)
tiles_url = r["tiles"][0]
print(f"tiles url {tiles_url}")
aod_layer = TileLayer(
    tiles=tiles_url,
    opacity=1,
    attr="DigitalGlobe OpenData"
)
aod_layer.add_to(m)
m
tiles url https://titiler.xyz/cog/tiles/WebMercatorQuad/{z}/{x}/{y}@1x?url=https%3A%2F%2Fopendata.digitalglobe.com%2Fevents%2Fmauritius-oil-spill%2Fpost-event%2F2020-08-12%2F105001001F1B5B00%2F105001001F1B5B00.tif
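The tiles URL printed above is a template: the map client substitutes {z}, {x} and {y} for each tile it needs, and the COG location travels as a percent-encoded url query parameter. The standard-library sketch below reproduces that encoding and fills the template for one tile using the common Web Mercator tiling formula. `lnglat_to_tile` is a hypothetical helper, and the point near Mauritius is an approximate, illustrative coordinate.

```python
import math
from urllib.parse import urlencode

def lnglat_to_tile(longitude, latitude, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom
    x = int((longitude + 180.0) / 360.0 * n)
    lat_rad = math.radians(latitude)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

cog_url = ("https://opendata.digitalglobe.com/events/mauritius-oil-spill/"
           "post-event/2020-08-12/105001001F1B5B00/105001001F1B5B00.tif")
# urlencode percent-encodes ':' and '/' so the COG URL is safe inside a query string.
template = ("https://titiler.xyz/cog/tiles/WebMercatorQuad/{z}/{x}/{y}@1x?"
            + urlencode({"url": cog_url}))

zoom = 10
x, y = lnglat_to_tile(57.7, -20.4, zoom)  # approximate point near Mauritius (illustrative)
tile_url = template.replace("{z}", str(zoom)).replace("{x}", str(x)).replace("{y}", str(y))
print(tile_url)
```

Each such URL returns a single tile image, so the server only ever reads the part of the COG that the current view needs.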
You will find extensive support material on the websites for each package. You may find these links particularly useful during the tutorial:
.hvplot()
aimee@developmentseed.org